Abstract

It is well known that most standard specification tests are not robust when the alternative is misspecified. We consider three types of misspecification typically encountered in econometric model specification testing, namely, complete misspecification, underspecification, and overspecification. In the case of complete misspecification the distributions specified under the alternative hypothesis do not include the data generating process (DGP), while underspecification refers to the alternative being a subset of a more general model representing the DGP. Overspecification is the case when the alternative hypothesis is overstated. Most likely, the first two types of misspecification are common in one-directional testing situations, whereas the last one arises when multi-directional joint tests are applied based on an overparametrized alternative model. Following Haavelmo's work, we provide a simple example to illustrate the effects of misspecification on testing economic hypotheses. We then find the asymptotic distributions of standard one-directional and multi-directional Lagrange multiplier (LM) tests under these three kinds of misspecification. Next, using these distributions, we suggest a robust specification test under misspecified alternatives. The new test is shown to be asymptotically equivalent to Neyman's $C(\alpha)$ test. Some applications are presented to illustrate our theoretical results.


1.  Introduction 

Econometricians'  concern  with  problems  which  arise  when  the  alternative  hypothesis 
used  to  construct  a  test  deviates  from  the  data  generating  process  (DGP)  goes  way  back  to 
Haavelmo  (1944).  In  this  pioneering  work,  "The  Probability  Approach  in  Econometrics", 
while discussing the problem of testing economic relations, he stated that (pp. 65-6)

"whatever be the principles by which we choose a "best" critical region of size $\alpha$, the essential thing is that a test is always developed with respect to a given fixed set of possible alternatives $\Omega^0$. If, on the basis of some general principles, a "best" test, or region, $W_0$ say, is developed for testing a given hypothesis $P \in \omega^0$ with respect to a set, $\Omega^0$, of a priori admissible hypotheses, and if we shift the attention to another a priori admissible set, $\Omega'$, also containing $\omega^0$, the same general principle will, usually, lead to another "best" critical region, say $W_0'$. In other words, if a test is developed on the basis of a given set of a priori admissible hypotheses, $\Omega^0$, the test is, in general, valid only for this set, $\Omega^0$".

In testing any economic relations, specification of the a priori admissible hypotheses, $\Omega^0$, is of fundamental importance. Following Haavelmo (p. 66), we define non-robustness of a test in the following way. A test is not robust if there exist some alternatives in $\Omega'$ such that the test has poor power with respect to those alternatives, where $\Omega'$ may be obtained by extending (or changing) $\Omega^0$ to include new (or different) alternatives.

Very often it will be difficult to interpret the results of a test applied to a misspecified model. For example, when testing the significance of some of the regression coefficients in a linear regression model, the results are not easily interpretable when a nonlinear model is the appropriate one [see White (1980), Bera and Byron (1983), and Byron and Bera (1983)]. This is due to the fact that under the linear regression model the "allowable" alternatives include only the system of regression equations of the same (linear) form, but with regression coefficients that are different from zero [see Haavelmo (p. 66)].

Typically, the alternative hypothesis may be misspecified in three different ways. The first is what we shall call "complete misspecification". In this case, the set of assumed alternatives, $\Omega^0$, and the DGP, $\Omega'$ say, are mutually exclusive, i.e., $(\Omega^0 - \omega^0) \cap (\Omega' - \omega^0) = \emptyset$. This happens, for instance, if one tests serial independence when the DGP has heteroskedastic disturbances with no serial dependence. In the second case the alternative is underspecified in that it is a subset of a more general model representing the DGP, i.e., $\Omega^0 \subset \Omega'$. This leads to the problem of "undertesting", which one has to guard against when "one-directional" tests [or "fewer-directional" tests than actually required] are performed. The last one is "overtesting", which results from overspecification, that is, when $\Omega^0 \supset \Omega'$. This is more likely to be the case when "multi-directional" joint tests are applied based on an overparametrized alternative model. [For a detailed discussion of the concepts of undertesting and overtesting, see Bera and Jarque (1982)]. In both undertesting and overtesting some loss of power is to be expected.

Most of the literature about the robustness of specification tests addresses these issues in one way or another. Given the popularity of one-directional specification tests, many researchers have paid attention to the non-robustness of these tests under complete misspecification or underspecification. Bera and Jarque (1982) reported some Monte Carlo results on the estimated power of some of the well known one-directional and multi-directional specification tests under different kinds of misspecification (see also the references cited therein for other related research). On the basis of their Monte Carlo results, Bera and Jarque (p. 71) concluded that undertesting resulted in considerable loss of power while the effect of overtesting was not that severe. Godfrey (1988, p. 79), Pagan and Wickens (1989, p. 993), and Pagan (1990) highlighted the importance of this issue, and Wooldridge (1990) developed some robust, regression-based specification tests. In recent papers, Davidson and MacKinnon (1985, 1987) and Saikkonen (1989) provided some analytical treatment of the problem. They derived the asymptotic distributions of the three classical test statistics, namely the likelihood ratio (LR), Rao's (1948) score or Lagrange multiplier (LM), and Wald (W) statistics, under complete misspecification, and examined the effect of the misspecification in terms of asymptotic relative efficiency (ARE).

The main purpose of this paper is to further analyze the effects of misspecification and to develop a robust procedure for one-directional specification testing with misspecified alternatives. Given the inherent uncertainty about the true DGP, it would seem that the alternative hypothesis is equally likely to be completely misspecified or underspecified in a one-directional testing situation. We allow for both types of misspecification by introducing a nuisance parameter which contaminates the null and non-null distributions of the test. Our suggested procedure will then be shown to possess correct asymptotic size in the presence of the nuisance parameter. We shall also show that the new test statistic is asymptotically equivalent to Neyman's (1959) $C(\alpha)$ test and hence locally asymptotically optimal.

In the next section we consider Haavelmo's (1944) example under a general setup and demonstrate the non-robustness of a significance test. Section 3 discusses the asymptotic distributions of one-directional and multi-directional LM tests under the three kinds of misspecification. Next, in Section 4, using these distributions, we suggest a robust specification test under misspecified alternatives. Section 5 establishes the connection between our procedure and the $C(\alpha)$ test. In Section 6 we present two applications of our procedure to the linear regression model and one to the heterogeneous Weibull model. The ARE of misspecified tests is also evaluated to further examine the effects of misspecification. The final Section 7 provides some concluding remarks. Some derivations are contained in an appendix.

2.  Haavelmo's  Example:  A  Simple  Problem  of  Trend  Fitting 
Haavelmo  considered  the  following  model  (pp.  75-81) 

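$y_t = b + kt + \epsilon_t \quad (t = 1, 2, \ldots, N),$

$E(y_t) = b + kt,$

$E(\epsilon_t) = 0, \quad E(\epsilon_t^2) = \sigma^2,$

$p(y_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(y_t - b - kt)^2\right],$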


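where $\sigma^2$ is assumed to be known. Let $H_0: k = 0$ be the hypothesis to be tested. The following joint probability specifies $\Omega^0$, the set of admissible hypotheses:

(2.1)  $P(y_1, y_2, \ldots, y_N) = \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left[-\frac{1}{2\sigma^2}\sum_{t=1}^{N}(y_t - b - kt)^2\right]$

with $-\infty < k < \infty$ and $-\infty < b < \infty$, and under $H_0$, this reduces to $\omega^0$, namely

(2.2)  $P(y_1, y_2, \ldots, y_N) = \frac{1}{(\sqrt{2\pi}\,\sigma)^N} \exp\left[-\frac{1}{2\sigma^2}\sum_{t=1}^{N}(y_t - b)^2\right].$

Using standard notation, the test will be based on

$\hat{k} = \frac{\sum (t - \bar{t})(y_t - \bar{y})}{\sum (t - \bar{t})^2}$

which has the following sampling distribution

$\hat{k} \sim N\left(k, \; \frac{\sigma^2}{\sum (t - \bar{t})^2}\right).$

The critical region for this test is

(2.3)  $\left|\hat{k}/s\right| > 1.96$

at the 5% level of significance. The power function, $\beta(k)$, can then be written as

(2.4)  $\beta(k) = 1 - \left[\Phi\left(1.96 - \frac{k}{s}\right) - \Phi\left(-1.96 - \frac{k}{s}\right)\right]$

where $\Phi(\cdot)$ is the distribution function of the standard normal distribution and $s = \sqrt{\sigma^2 / \sum (t - \bar{t})^2}$.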

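Haavelmo then perturbed the a priori admissible hypotheses $\Omega^0$ by making $\epsilon_t$ dependent, more specifically,

(2.5)  $\epsilon_t = \frac{1}{\sqrt{2}}(\nu_{t-1} + \nu_t)$

where $\nu_t \sim IID\,N(0, \sigma^2)$, and studied the changes in the power function $\beta(k)$. However, under this setup the correlation coefficient between $\epsilon_t$ and $\epsilon_{t-1}$ is fixed at 1/2. More generally, let us assume

(2.6)  $\epsilon_t = \rho\,\epsilon_{t-1} + \nu_t$

with $|\rho| < 1$ and $\nu_t \sim IID\,N(0, \sigma^2)$. The sampling distribution of $\hat{k}$ with the $\epsilon_t$'s following the AR(1) process of (2.6) is given by

$\hat{k} \sim N\left(k, \; \frac{\sigma^2}{\sum (t - \bar{t})^2}\, C\right)$

where

$C = \frac{1}{1 - \rho^2}\left[1 + \frac{2 \sum_{t < t'} (t - \bar{t})(t' - \bar{t})\,\rho^{|t - t'|}}{\sum (t - \bar{t})^2}\right].$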


At the 5% significance level, the new power function, $\beta'(k|\rho)$ say, of the test based on the critical region (2.3) may now be expressed as

(2.7)  $\beta'(k|\rho) = 1 - \left[\Phi\left(1.96\,\frac{s}{s'} - \frac{k}{s'}\right) - \Phi\left(-1.96\,\frac{s}{s'} - \frac{k}{s'}\right)\right]$

where $s' = \sqrt{\sigma^2 C / \sum (t - \bar{t})^2}$. Obviously the two power functions defined by (2.4) and (2.7) coincide when $\rho = 0$. Using the same numerical values for $N$ and $\sigma$ as in Haavelmo's example, we obtain plots of $\beta'(k|\rho)$ for different values of $\rho$. These plots are presented in Figure 1. From the figure it is easy to observe what happens to the size and power of the above test based on the set of a priori admissible hypotheses $\Omega^0$ specified by (2.1) when in fact the $\epsilon_t$'s are serially correlated. For instance, when $\rho = .8$, the true type I error probability could be as high as .37 and convergence of power to 1 for distant alternatives is quite slow.
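
The effect of serial correlation on the size and power of the trend test can be sketched numerically. The following is a minimal illustration, not Haavelmo's own computation: it assumes the critical region $|\hat{k}/s| > 1.96$, AR(1) errors with $|\rho| < 1$, and the variance factor $C$ obtained from the AR(1) autocovariances $\sigma^2\rho^{|t-t'|}/(1-\rho^2)$. The values of `N_OBS` and `SIGMA` are hypothetical placeholders, since Haavelmo's numerical values are not reproduced in the text.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def trend_deviations(n):
    """Deviations t - t_bar for t = 1, ..., n."""
    t_bar = (n + 1) / 2.0
    return [t - t_bar for t in range(1, n + 1)]


def inflation_factor(n, rho):
    """Factor C scaling Var(k_hat) when the errors follow an AR(1) process."""
    w = trend_deviations(n)
    sum_w2 = sum(wi * wi for wi in w)
    cross = sum(
        w[i] * w[j] * rho ** (j - i)
        for i in range(n)
        for j in range(i + 1, n)
    )
    return (1.0 + 2.0 * cross / sum_w2) / (1.0 - rho ** 2)


def power(k, rho, n, sigma, z=1.96):
    """Probability that |k_hat / s| > z when the errors are AR(1) with parameter rho."""
    sum_w2 = sum(wi * wi for wi in trend_deviations(n))
    s = sigma / math.sqrt(sum_w2)                      # std. dev. assumed by the test
    s_prime = s * math.sqrt(inflation_factor(n, rho))  # actual std. dev. of k_hat
    upper = norm_cdf((z * s - k) / s_prime)
    lower = norm_cdf((-z * s - k) / s_prime)
    return 1.0 - (upper - lower)


if __name__ == "__main__":
    N_OBS, SIGMA = 20, 1.0  # hypothetical values, not Haavelmo's
    for rho in (0.0, 0.4, 0.8):
        # k = 0 gives the true size of the nominal 5% test
        print(rho, round(power(0.0, rho, N_OBS, SIGMA), 3))
```

Evaluating `power(0.0, rho, ...)` gives the true type I error probability of the nominal 5% test; it equals the nominal level only at `rho = 0`.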

This example is very simple and almost half a century old. In spite of that, it is at the heart of what we believe to be the central problem in the current practice of econometric model specification tests and diagnostic checks. It clearly demonstrates that we should study the properties of our commonly used one-directional specification tests for certain alternatives not contained in the a priori admissible hypotheses $\Omega^0$, since it is quite possible that a certain outside scheme is the true one, with serious consequences for our inference [Haavelmo (p. 81)].

3.  Distribution  of  Tests  under  Misspecification 

In the previous section, we used an example to discuss the behavior of test statistics when the model is misspecified. We now set up a general theoretical framework and study the distribution of tests under misspecification analytically. We mainly concentrate on the LM test. However, our analysis could be extended to the LR and W tests.

Consider a general statistical model represented by the log-likelihood $L(\gamma, \psi, \phi)$ where $\gamma$, $\psi$, and $\phi$ are parameter vectors with dimensions $(m \times 1)$, $(r \times 1)$, and $(s \times 1)$ respectively. We shall follow Saikkonen's (1989) notation whenever possible. Suppose that one's primary interest is in model diagnostics or in specification search such that $L_0(\gamma)$ is the null model with possible alternatives $L_1(\gamma, \psi)$, $L_2(\gamma, \phi)$, and $L(\gamma, \psi, \phi)$. Let us assume, as in Saikkonen (1989), that the following relations are true: $L_0(\gamma) = L_1(\gamma, \psi_*) = L_2(\gamma, \phi_*)$; $L_1(\gamma, \psi) = L(\gamma, \psi, \phi_*)$; and $L_2(\gamma, \phi) = L(\gamma, \psi_*, \phi)$, where $\psi_*$ and $\phi_*$ are known parameter values.

[Figure 1. Power functions $\beta'(k|\rho)$ for different values of $\rho$. Vertical axis: power, from 0.0 to 1.0.]

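Let us now focus on the one-directional test for $H_0: \psi = \psi_*$ in the alternative model $L_1(\gamma, \psi)$, ignoring the nuisance parameter $\phi$. Typically $\psi_* = 0$, representing a zero restriction, and the MLE of $\gamma$ under $H_0$, $\tilde{\gamma}$ say, is readily available. In this situation the LM test is the preferred approach, and it is, in fact, locally optimal if the alternative correctly represents the DGP. Let $LM_\psi$ be the LM test statistic for $H_0$. Let us denote $\theta = (\gamma', \psi', \phi')'$ and $\tilde{\theta} = (\tilde{\gamma}', \psi_*', \phi_*')'$. Since $L_1(\gamma, \psi) = L(\gamma, \psi, \phi_*)$, we can express the score vector and the information matrix needed for $LM_\psi$ conveniently using $\theta$ and $L(\theta)$. Imposing the standard regularity conditions on $L(\theta)$ [see Saikkonen (1989) and the references cited therein], let $d_\psi(\theta) = \partial L(\theta)/\partial \psi$ and $J(\theta) = -\mathrm{plim}\,(N^{-1}\,\partial^2 L(\theta)/\partial\theta\,\partial\theta')$. $LM_\psi$ for testing $H_0$ based on $L_1(\gamma, \psi)$ can now be written as

(3.1)  $LM_\psi = \frac{1}{N}\, d_\psi(\tilde{\theta})'\, J_{\psi\cdot\gamma}^{-1}(\tilde{\theta})\, d_\psi(\tilde{\theta})$

where $J_{\psi\cdot\gamma}(\theta) = J_\psi(\theta) - J_{\psi\gamma}(\theta)\, J_\gamma^{-1}(\theta)\, J_{\gamma\psi}(\theta)$, $J_\psi(\theta) = J_{\psi\psi}(\theta) = -\mathrm{plim}\,(N^{-1}\,\partial^2 L(\theta)/\partial\psi\,\partial\psi')$, and $J_{\psi\gamma}(\theta) = -\mathrm{plim}\,(N^{-1}\,\partial^2 L(\theta)/\partial\psi\,\partial\gamma')$, etc. Given correct specification, $LM_\psi$ has well known asymptotic distributions under the null and under a sequence of local alternatives. This may be summarized as follows: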

Case 1.  Correct Specification

Consider testing $H_0: \psi = \psi_*$ in $L_1(\gamma, \psi)$ where $L_1(\gamma, \psi)$ represents the true model. Under $H_0$,

(3.2)  $LM_\psi \xrightarrow{D} \chi^2_r(0).$

Under $H_1: \psi = \psi_* + \xi/\sqrt{N}$,

(3.3)  $LM_\psi \xrightarrow{D} \chi^2_r(\lambda_1)$

where $\lambda_1 = \lambda_1(\xi) = \xi' J_{\psi\cdot\gamma}\, \xi$ and $\xi \neq 0$.

We use $\xrightarrow{D}$ to denote convergence in distribution, and $\chi^2_r(\lambda_1)$ stands for the non-central chi-square distribution with $r$ degrees of freedom and non-centrality parameter $\lambda_1$. Note also that the argument of $J_{\psi\cdot\gamma}$ is suppressed such that $J = J(\theta_*)$ where $\theta_* = (\gamma_0', \psi_*', \phi_*')'$ with $\gamma_0$ denoting the true value of $\gamma$.

Now we shift the attention to the three types of misspecification, namely, complete misspecification, underspecification, and overspecification. Let us first consider complete misspecification. Suppose the true log-likelihood function is $L_2(\gamma, \phi)$, so that the alternative $L_1(\gamma, \psi)$ becomes completely misspecified. Using the sequence of local DGPs $\phi = \phi_* + \delta/\sqrt{N}$ ($\delta \neq 0$), Davidson and MacKinnon (1987) and Saikkonen (1989) obtained the asymptotic distribution of $LM_\psi$ under $L_2(\gamma, \phi)$. The result may be stated as:

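Case 2.  Complete Misspecification

Consider testing $H_0: \psi = \psi_*$ in $L_1(\gamma, \psi)$ where $L_2(\gamma, \phi)$ represents the true model. Under $L_2(\gamma, \phi)$ with $\phi = \phi_* + \delta/\sqrt{N}$,

(3.4)  $LM_\psi \xrightarrow{D} \chi^2_r(\lambda_2)$

where $\lambda_2 = \lambda_2(\delta) = \delta' J_{\phi\psi\cdot\gamma}\, J_{\psi\cdot\gamma}^{-1}\, J_{\psi\phi\cdot\gamma}\, \delta$ and $J_{\psi\phi\cdot\gamma} = J_{\psi\phi} - J_{\psi\gamma} J_\gamma^{-1} J_{\gamma\phi} = J_{\phi\psi\cdot\gamma}'$.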
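
To make the size distortion in Case 2 concrete, the following sketch evaluates the asymptotic rejection probability of the nominal 5% test for scalar $\psi$, $\phi$, and $\gamma$ (so $r = 1$). It relies on the standard fact that a $\chi^2_1(\lambda)$ variable is distributed as $(Z + \sqrt{\lambda})^2$ with $Z$ standard normal. The information matrix entries used in the demonstration are hypothetical and chosen only for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def partial_out(j_ab, j_ag, j_gg, j_gb):
    """Scalar version of J_{ab.gamma} = J_ab - J_{a gamma} J_gamma^{-1} J_{gamma b}."""
    return j_ab - j_ag * j_gb / j_gg


def lambda_2(delta, j_psi_phi_g, j_psi_g):
    """Non-centrality parameter of (3.4) for scalar psi and phi."""
    return (delta * j_psi_phi_g) ** 2 / j_psi_g


def rejection_prob_df1(lam, z=1.96):
    """P(chi2_1(lam) > z**2), using chi2_1(lam) = (Z + sqrt(lam))**2."""
    root = math.sqrt(lam)
    return 1.0 - (norm_cdf(z - root) - norm_cdf(-z - root))


if __name__ == "__main__":
    # Hypothetical information matrix entries (illustration only).
    J_GG, J_PSI, J_PSI_G, J_G_PHI, J_PSI_PHI = 1.0, 1.0, 0.2, 0.3, 0.6
    j_psi_g = partial_out(J_PSI, J_PSI_G, J_GG, J_PSI_G)        # J_{psi.gamma}
    j_psi_phi_g = partial_out(J_PSI_PHI, J_PSI_G, J_GG, J_G_PHI)  # J_{psi phi.gamma}
    for delta in (0.0, 1.0, 2.0):
        lam = lambda_2(delta, j_psi_phi_g, j_psi_g)
        print(delta, round(lam, 3), round(rejection_prob_df1(lam), 3))
```

With `delta = 0`, or with a zero partial covariance `j_psi_phi_g`, the non-centrality vanishes and the rejection probability returns to the nominal level.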

With this asymptotic distribution of the misspecified test $LM_\psi$, the above authors investigated the power properties of $LM_\psi$ in the direction of the $\phi$ parameter. In particular, Saikkonen (1989) explicitly computed the ARE of $LM_\psi$ with respect to the optimal test (based on the true model), $LM_\phi$ say, for the general case, $r \neq s$. For our present purpose, we would like to view the above result from a different angle. Note that the asymptotic distribution of $LM_\psi$ was obtained under $L_2(\gamma, \phi)$, which is in fact $L(\gamma, \psi_*, \phi)$ by construction. Thus, one can interpret $L_2(\gamma, \phi)$ as the log-likelihood of the "null" model "contaminated" locally by the nuisance parameter $\phi = \phi_* + \delta/\sqrt{N}$. In other words, the situation can be treated as hypothesis testing in the presence of a nuisance parameter. An immediate effect of this parameter is that even asymptotically the size of the test, as is apparent from the non-centrality parameter $\lambda_2$, is not correct, unless $\delta\ (\neq 0)$ belongs to the null space of $J_{\psi\phi\cdot\gamma}$ or $J_{\psi\phi\cdot\gamma}$ itself is zero.

Turning now to the case of underspecification, let the true model be represented by the log-likelihood $L(\gamma, \psi, \phi)$. The alternative $L_1(\gamma, \psi)$ is now underspecified with respect to the nuisance parameter $\phi$, leading to the problem of undertesting. Saikkonen (1989) did not consider this case explicitly. In order to derive the asymptotic distribution of $LM_\psi$ under the true model $L(\gamma, \psi, \phi)$ we again consider the local departures $\phi = \phi_* + \delta/\sqrt{N}$ together with $\psi = \psi_* + \xi/\sqrt{N}$. We obtain the following result:

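Case 3.  Underspecification

Suppose we want to test $H_0: \psi = \psi_*$ in $L_1(\gamma, \psi)$ where $L(\gamma, \psi, \phi)$ represents the true model. Under $L(\gamma, \psi, \phi)$ with $\psi = \psi_* + \xi/\sqrt{N}$ and $\phi = \phi_* + \delta/\sqrt{N}$,

(3.5)  $LM_\psi \xrightarrow{D} \chi^2_r(\lambda_3)$

where

$\lambda_3 = \lambda_3(\xi, \delta) = \xi' J_{\psi\cdot\gamma}\, \xi + 2\, \xi' J_{\psi\phi\cdot\gamma}\, \delta + \delta' J_{\phi\psi\cdot\gamma}\, J_{\psi\cdot\gamma}^{-1}\, J_{\psi\phi\cdot\gamma}\, \delta.$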
See the appendix for a derivation. To interpret the result, note that the true model $L(\gamma, \psi, \phi)$ is to be seen as a "non-null" model in which departures from the null hypothesis are two-directional. This is reflected in the non-centrality parameter $\lambda_3(\xi, \delta)$, which is a function of both $\xi$ and $\delta$. It may also be of interest to observe that should $\delta$ lie in the null space of $J_{\psi\phi\cdot\gamma}$, or should $J_{\psi\phi\cdot\gamma}$ be a null matrix, $\lambda_3(\xi, \delta)$ reduces to $\lambda_1(\xi)$, which is the non-centrality parameter of the asymptotic non-null distribution associated with the optimal test as in (3.3). Using our result, one may wish to compare the asymptotic local power of the underspecified test with that of the optimal test. It turns out that the contaminated non-centrality parameter $\lambda_3(\xi, \delta)$ may actually increase or decrease the power depending on the configuration of the term $\xi' J_{\psi\phi\cdot\gamma}\, \delta$.
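
The dependence of the underspecified test's non-centrality on the sign of the cross term can be sketched for scalar parameters. The sketch assumes that the normalized score has asymptotic mean $J_{\psi\cdot\gamma}\xi + J_{\psi\phi\cdot\gamma}\delta$ and variance $J_{\psi\cdot\gamma}$, so that the non-centrality parameter is the squared mean divided by the variance. The numerical values are hypothetical.

```python
def lambda_1(xi, j_psi_g):
    """Non-centrality of the optimal test, xi' J_{psi.gamma} xi, scalar case."""
    return xi * j_psi_g * xi


def lambda_3(xi, delta, j_psi_g, j_psi_phi_g):
    """Non-centrality under underspecification, scalar case.

    Assumes the normalized score has mean J_{psi.gamma} xi + J_{psi phi.gamma} delta
    and variance J_{psi.gamma}.
    """
    mean = j_psi_g * xi + j_psi_phi_g * delta
    return mean * mean / j_psi_g


if __name__ == "__main__":
    J_PSI_G, J_PSI_PHI_G = 0.96, 0.54  # hypothetical partialled-out information entries
    xi = 1.0
    for delta in (-1.0, 0.0, 1.0):
        # delta = 0 reproduces lambda_1; the sign of xi * J * delta moves lambda_3 up or down
        print(delta, round(lambda_3(xi, delta, J_PSI_G, J_PSI_PHI_G), 4),
              round(lambda_1(xi, J_PSI_G), 4))
```

In this scalar setting a positive cross term raises the non-centrality above $\lambda_1$, while a sufficiently negative one pushes it below, which is the sense in which undertesting can either gain or lose power.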

So far, we have concentrated on one-directional testing, and discussed the two types of misspecification commonly associated with it. Let us finally move on to overspecification. As indicated before, this leads to the problem of overtesting when multi-directional joint tests are applied based on an overstated alternative model. Suppose we apply a joint test of a hypothesis of the form $H_0: \psi = \psi_*$ and $\phi = \phi_*$ using the alternative model $L(\gamma, \psi, \phi)$. Let $LM_{\psi\phi}$ be the joint LM test statistic for $H_0$. To find the asymptotic distribution of $LM_{\psi\phi}$ under overspecification, i.e., when the DGP is represented by either $L_1(\gamma, \psi)$ or $L_2(\gamma, \phi)$, we recall the following well known result. Assuming correct specification, i.e., under the true model represented by $L(\gamma, \psi, \phi)$ with $\psi = \psi_* + \xi/\sqrt{N}$ and $\phi = \phi_* + \delta/\sqrt{N}$,

(3.6)  $LM_{\psi\phi} \xrightarrow{D} \chi^2_{r+s}(\lambda_4)$

where

$\lambda_4 = \lambda_4(\xi, \delta) = [\xi' \;\; \delta'] \begin{bmatrix} J_{\psi\cdot\gamma} & J_{\psi\phi\cdot\gamma} \\ J_{\phi\psi\cdot\gamma} & J_{\phi\cdot\gamma} \end{bmatrix} \begin{bmatrix} \xi \\ \delta \end{bmatrix}$

with $J_{\phi\cdot\gamma} = J_\phi - J_{\phi\gamma} J_\gamma^{-1} J_{\gamma\phi}$.


Using this fact, we can easily find the asymptotic distribution of the overspecified test as follows:

Case 4.  Overspecification

Consider testing $H_0: \psi = \psi_*$ and $\phi = \phi_*$ in $L(\gamma, \psi, \phi)$ where $L_1(\gamma, \psi)$ represents the true model. Under $L_1(\gamma, \psi)$ with $\psi = \psi_* + \xi/\sqrt{N}$, we obtain by setting $\delta = 0$ in (3.6)

$LM_{\psi\phi} \xrightarrow{D} \chi^2_{r+s}(\lambda_5)$

where $\lambda_5 = \lambda_5(\xi) = \xi' J_{\psi\cdot\gamma}\, \xi$.

Note that the non-centrality parameter $\lambda_5(\xi)$ of the overspecified test is identical to $\lambda_1(\xi)$ of the optimal test $LM_\psi$ in (3.3). Although $\lambda_5 = \lambda_1$, some loss of power is to be expected, as shown in Das Gupta and Perlman (1974), due to the higher degrees of freedom of the joint test $LM_{\psi\phi}$. Notice also that should the DGP be represented by $L_2(\gamma, \phi)$, the non-centrality parameter of the overspecified test would be $\delta' J_{\phi\cdot\gamma}\, \delta$.
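
The power loss from overtesting can be illustrated in the simplest configuration, $r = s = 1$, where the optimal test has one degree of freedom and the joint test has two, both with the same non-centrality parameter. The sketch below uses only standard facts: a non-central chi-square is a Poisson mixture of central chi-squares, and the survival function of a central chi-square with an even number of degrees of freedom has a closed form. Function names and the value of the non-centrality parameter are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_df1(lam, z=1.96):
    """P(chi2_1(lam) > z**2), the local power of the one-directional test (r = 1)."""
    root = math.sqrt(lam)
    return 1.0 - (norm_cdf(z - root) - norm_cdf(-z - root))


def chi2_sf_even(c, df):
    """P(chi2_df > c) for even df: exp(-c/2) * sum_{i < df/2} (c/2)**i / i!."""
    half = c / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i
        total += term
    return math.exp(-half) * total


def power_df2(lam, alpha=0.05, terms=200):
    """P(chi2_2(lam) > c_alpha), the local power of the joint test (r + s = 2)."""
    c = -2.0 * math.log(alpha)   # exact upper-alpha point of a central chi2 with 2 df
    mu = lam / 2.0
    weight = math.exp(-mu)       # Poisson(mu) probability of j = 0
    total = 0.0
    for j in range(terms):
        if j > 0:
            weight *= mu / j     # Poisson probability of j from that of j - 1
        total += weight * chi2_sf_even(c, 2 + 2 * j)
    return total


if __name__ == "__main__":
    for lam in (0.0, 2.0, 5.0, 10.0):  # hypothetical non-centrality values
        print(lam, round(power_df1(lam), 3), round(power_df2(lam), 3))
```

For any positive non-centrality the two-degree-of-freedom test rejects less often than the one-degree-of-freedom test, which is the loss attributed in the text to the higher degrees of freedom.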


4.  Formulation  of  a  Robust  Test 


At the outset we should mention that the word "robust" is used here in a limited sense. We are particularly interested in developing a test that has correct size asymptotically and some optimal properties. In other words, our aim is to construct a size-resistant test. In the language of Stein (1956), we can also call it an "adaptive" test, where we adapt the statistic for the nuisance parameter. As we have seen in the previous section, the presence of a nuisance parameter contaminates the asymptotic null and non-null distributions of the one-directional test $LM_\psi$, which then has incorrect size and, presumably, suboptimal power. Moreover, the uncertainty about the directions in which the alternative hypothesis may deviate from the DGP makes it important to guard against both complete misspecification and underspecification in one-directional testing situations. This motivates a modified test procedure that would guarantee, at least asymptotically, the correct size and the optimal power under possible contamination by the nuisance parameter.

We shall start with the asymptotic null distribution of the score vector $d_\psi(\tilde{\theta})$ contaminated by the nuisance parameter $\phi$ under the complete misspecification stated in Case 2. Following Davidson and MacKinnon (1987) and Saikkonen (1989) it is seen that under $H_0: \psi = \psi_*$, with the DGP given by $L_2(\gamma, \phi)$ as in Case 2,

(4.1)  $\frac{1}{\sqrt{N}}\, d_\psi(\tilde{\theta}) \xrightarrow{D} N(J_{\psi\phi\cdot\gamma}\, \delta, \; J_{\psi\cdot\gamma}).$

Obviously, it is the nonzero mean $J_{\psi\phi\cdot\gamma}\, \delta$ of the asymptotic normal distribution of the score that gives rise to the non-centrality of the asymptotic null distribution of $LM_\psi$, yielding incorrect size of the test. Now, a natural solution to this problem would be to modify the score by subtracting the nonzero mean so that the resulting quadratic form would have a central asymptotic chi-square distribution. That is, we should consider

(4.2)  $\frac{1}{\sqrt{N}}\, d_\psi(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}\, \delta \xrightarrow{D} N(0, \; J_{\psi\cdot\gamma}).$


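In order to formulate a test statistic based on (4.2) one has to estimate $\delta$, among other things. In fact this amounts to estimating $\phi_0$, since $\delta = \sqrt{N}(\phi_0 - \phi_*)$ where $\phi_0$ denotes the true value of $\phi$. For this we use the one-step method-of-scoring estimator, which is also known as the linearized maximum likelihood estimator [see Schmidt (1976, p. 234)]. Let $\tilde{\theta} = (\tilde{\gamma}', \psi_*', \phi_*')'$ be an initial consistent estimate of the true parameter vector $\theta_0 = (\gamma_0', \psi_*', \phi_0')'$ associated with the log-likelihood $L_2(\gamma, \phi)$. Then the one-step method-of-scoring estimator $(\hat{\gamma}', \hat{\phi}')'$ is

(4.3)  $\begin{bmatrix} \hat{\gamma} \\ \hat{\phi} \end{bmatrix} = \begin{bmatrix} \tilde{\gamma} \\ \phi_* \end{bmatrix} + \frac{1}{N} \begin{bmatrix} J_\gamma(\tilde{\theta}) & J_{\gamma\phi}(\tilde{\theta}) \\ J_{\phi\gamma}(\tilde{\theta}) & J_\phi(\tilde{\theta}) \end{bmatrix}^{-1} \begin{bmatrix} d_\gamma(\tilde{\theta}) \\ d_\phi(\tilde{\theta}) \end{bmatrix}.$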


This updating can be viewed as an attempt to "correct" the initial estimators $\tilde{\gamma}$ and $\phi_*$ to take account of the local departure of $\phi$ from $\phi_*$. Chesher, Lancaster and Irish (1985, p. 18) use precisely this kind of Newton algorithm to measure the effect of small departures from parameter constancy (the null hypothesis) on the consistency of MLEs and to correct MLEs for the effects of local parameter variation.
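
A one-step scoring update of this kind can be sketched in the simplest possible setting. The example below is an assumption-laden illustration rather than the model of this paper: it takes $y_i \sim N(\phi, 1)$ with no other parameters, so the score at the starting value $\phi_*$ is $\sum (y_i - \phi_*)$ and the information per observation is 1. Because the log-likelihood is quadratic, a single step from any starting value lands on the MLE, the sample mean.

```python
def one_step_scoring(phi_start, score, info, n):
    """One scoring step: phi_start + (1/n) * info^{-1} * score (scalar case)."""
    return phi_start + score / (n * info)


def normal_mean_score(y, phi):
    """Score of the N(phi, 1) log-likelihood with respect to phi."""
    return sum(yi - phi for yi in y)


if __name__ == "__main__":
    y = [0.3, -0.1, 0.4, 0.2]  # hypothetical data
    phi_star = 0.0             # value of phi under the null
    phi_hat = one_step_scoring(phi_star, normal_mean_score(y, phi_star), 1.0, len(y))
    print(phi_hat, sum(y) / len(y))
```

In models where the log-likelihood is not quadratic the one-step estimator differs from the MLE in finite samples, and the update is used only for its first-order asymptotic behavior.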
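Since $d_\gamma(\tilde{\theta}) = 0$, we have

(4.4)  $(\hat{\phi} - \phi_*) = \frac{1}{N}\, J_{\phi\cdot\gamma}^{-1}(\tilde{\theta})\, d_\phi(\tilde{\theta})$

where $J_{\phi\cdot\gamma}(\tilde{\theta}) = J_\phi(\tilde{\theta}) - J_{\phi\gamma}(\tilde{\theta})\, J_\gamma^{-1}(\tilde{\theta})\, J_{\gamma\phi}(\tilde{\theta})$. Since $J_{\phi\cdot\gamma}(\tilde{\theta}) = J_{\phi\cdot\gamma} + o_p(1)$, (4.4) can also be written as

(4.5)  $\sqrt{N}(\hat{\phi} - \phi_*) \overset{a}{=} J_{\phi\cdot\gamma}^{-1}\, \frac{1}{\sqrt{N}}\, d_\phi(\tilde{\theta})$

where the notation $\overset{a}{=}$ means that the difference between the two sides of the equality converges in probability to zero. Replacing $\delta$ in (4.2) by the right hand side of (4.5) thus gives

(4.6)  $\frac{1}{\sqrt{N}}\, d_\psi(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}\, J_{\phi\cdot\gamma}^{-1}\, \frac{1}{\sqrt{N}}\, d_\phi(\tilde{\theta})$

and we need to find its asymptotic distribution.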

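The asymptotic distribution of (4.6) can be derived using (4.1) together with the well known result that under $L_2(\gamma, \phi)$

(4.7)  $\frac{1}{\sqrt{N}}\, d_\phi(\tilde{\theta}) \xrightarrow{D} N(J_{\phi\cdot\gamma}\, \delta, \; J_{\phi\cdot\gamma}).$

Using (4.1) and (4.7), after some simplification we obtain

(4.8)  $\frac{1}{\sqrt{N}}\left[d_\psi(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}\, J_{\phi\cdot\gamma}^{-1}\, d_\phi(\tilde{\theta})\right] \xrightarrow{D} N(0, \; J_{\psi\cdot\gamma} - J_{\psi\phi\cdot\gamma}\, J_{\phi\cdot\gamma}^{-1}\, J_{\phi\psi\cdot\gamma}).$

Observe that the original asymptotic variance of (4.2) has changed after $\delta$ was replaced by its estimate. The new asymptotic variance of (4.8) may be interpreted as the error sum of squares (ESS) of regressing $d_\psi$ on $d_\phi$ after eliminating the linear effect of $d_\gamma$. Using (4.8) and the fact that $J(\tilde{\theta}) = J + o_p(1)$ yields the following proposition.

PROPOSITION. Consider the statistical model with log-likelihood function $L(\gamma, \psi, \phi)$ where $\psi$ is the test parameter, and the parameter $\phi$ is a nuisance parameter such that $\phi_0 = \phi_* + \delta/\sqrt{N}$ with $\delta \neq 0$. Define a test statistic denoted by $LM_\psi^*$ as

(4.9)  $LM_\psi^* = \frac{1}{N}\left[d_\psi(\tilde{\theta}) - N J_{\psi\phi\cdot\gamma}(\tilde{\theta})(\hat{\phi} - \phi_*)\right]' \left[J_{\psi\cdot\gamma}(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}(\tilde{\theta})\, J_{\phi\cdot\gamma}^{-1}(\tilde{\theta})\, J_{\phi\psi\cdot\gamma}(\tilde{\theta})\right]^{-1} \left[d_\psi(\tilde{\theta}) - N J_{\psi\phi\cdot\gamma}(\tilde{\theta})(\hat{\phi} - \phi_*)\right],$

or alternatively,

$LM_\psi^* = \frac{1}{N}\left[d_\psi(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}(\tilde{\theta})\, J_{\phi\cdot\gamma}^{-1}(\tilde{\theta})\, d_\phi(\tilde{\theta})\right]' \left[J_{\psi\cdot\gamma}(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}(\tilde{\theta})\, J_{\phi\cdot\gamma}^{-1}(\tilde{\theta})\, J_{\phi\psi\cdot\gamma}(\tilde{\theta})\right]^{-1} \left[d_\psi(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}(\tilde{\theta})\, J_{\phi\cdot\gamma}^{-1}(\tilde{\theta})\, d_\phi(\tilde{\theta})\right].$

Under the regularity conditions, and when $H_0: \psi = \psi_*$ is true,

(4.10)  $LM_\psi^* \xrightarrow{D} \chi^2_r(0).$

Under the local alternatives $H_1: \psi = \psi_* + \xi/\sqrt{N}$,

(4.11)  $LM_\psi^* \xrightarrow{D} \chi^2_r(\lambda_6)$

where $\lambda_6 = \lambda_6(\xi) = \xi'(J_{\psi\cdot\gamma} - J_{\psi\phi\cdot\gamma}\, J_{\phi\cdot\gamma}^{-1}\, J_{\phi\psi\cdot\gamma})\, \xi$.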

The result in (4.10) is self-evident from the preceding discussion. The asymptotic non-null distribution of $LM_\psi^*$ stated in (4.11) can be deduced from our previous section's result summarized in Case 3; see the appendix for a proof. The above Proposition thus establishes the "robustness" of the new test by showing that $LM_\psi^*$ has the same asymptotic null distribution as the $LM_\psi$ based on correct specification, thereby producing asymptotically correct size under the completely misspecified alternative $L_2(\gamma, \phi)$. It should also be mentioned that $LM_\psi^*$ is based on $\tilde{\theta} = (\tilde{\gamma}', \psi_*', \phi_*')'$, circumventing direct estimation of the nuisance parameter $\phi$. However, we do pay a price for this simplicity in estimation and robustification. Since $\lambda_1 - \lambda_6 = \xi' J_{\psi\phi\cdot\gamma}\, J_{\phi\cdot\gamma}^{-1}\, J_{\phi\psi\cdot\gamma}\, \xi \geq 0$, the asymptotic power of $LM_\psi^*$ will be less than that of $LM_\psi$ when there is no misspecification. The above quantity can be viewed as the cost of robustification.



Now note that $LM_\psi = LM_\psi^*$ when $J_{\psi\phi\cdot\gamma} = 0$. With the interpretation of the matrix $J_{\psi\phi\cdot\gamma} = J_{\psi\phi} - J_{\psi\gamma} J_\gamma^{-1} J_{\gamma\phi}$ as the partial covariance between $d_\psi$ and $d_\phi$ after eliminating the linear effect of $d_\gamma$ on $d_\psi$ and $d_\phi$ [see Anderson (1984, p. 36)], it is easy to see that the "zero" partial covariance between the two scores implies the block diagonality of the inverse of the appropriate information matrix for the joint LM test, $LM_{\psi\phi}$ say, for testing $H_0: \psi = \psi_*$ and $\phi = \phi_*$. Then the necessary and sufficient condition for the additivity of the LM test is satisfied, i.e., $LM_{\psi\phi} = LM_\psi + LM_\phi$ [see Bera and McKenzie (1987)]. Also, as noted before, in this case $\lambda_2$ in (3.4) vanishes. In other words, for the class of models where $J_{\psi\phi\cdot\gamma} = 0$, the conventional one-directional test $LM_\psi$ would be asymptotically valid even in the presence of a nuisance parameter $\phi$, due to the asymptotic independence between the corresponding scores. In the context of adaptive estimation Stein (1956, p. 189) also stated the condition $J_{\psi\phi\cdot\gamma} = 0$, and he interpreted it as meaning that the addition of $\phi$ as an unknown parameter does not make the problem of estimating $\psi$ asymptotically any more difficult.
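
The construction of the adjusted statistic can be sketched for scalar $\gamma$, $\psi$, and $\phi$. The function below is an illustration under stated assumptions: the scores and information entries are taken as given numbers evaluated at the restricted estimate, the argument names are hypothetical, and the adjusted statistic is the squared score of $\psi$ net of its projection on the score of $\phi$, divided by the correspondingly reduced variance.

```python
def lm_statistics(d_psi, d_phi, n, j_gg, j_psi, j_phi, j_psi_g, j_phi_g, j_psi_phi):
    """Return (LM, adjusted LM) for scalar gamma, psi and phi.

    d_psi, d_phi : scores for psi and phi at the restricted estimate
    j_*          : entries of the information matrix J at the same point
    """
    # Partial out gamma from every block.
    j_psi_dot = j_psi - j_psi_g * j_psi_g / j_gg        # J_{psi.gamma}
    j_phi_dot = j_phi - j_phi_g * j_phi_g / j_gg        # J_{phi.gamma}
    j_cross = j_psi_phi - j_psi_g * j_phi_g / j_gg      # J_{psi phi.gamma}

    # Conventional one-directional statistic.
    lm = d_psi * d_psi / (n * j_psi_dot)

    # Adjusted score and its asymptotic variance.
    adj_score = d_psi - j_cross / j_phi_dot * d_phi
    adj_var = j_psi_dot - j_cross * j_cross / j_phi_dot
    lm_star = adj_score * adj_score / (n * adj_var)
    return lm, lm_star


if __name__ == "__main__":
    # Hypothetical scores and information entries, for illustration only.
    print(lm_statistics(3.0, 2.0, 100, 1.0, 1.0, 1.0, 0.2, 0.3, 0.6))
    # Zero partial covariance: the two statistics coincide.
    print(lm_statistics(3.0, 2.0, 100, 1.0, 1.0, 1.0, 0.2, 0.3, 0.06))
```

When the partial covariance between the two scores is zero the adjustment has no effect, which matches the condition under which the conventional statistic is already asymptotically valid.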

5.  Connection with the $C(\alpha)$ Test

A striking feature of $LM_\psi^*$ is its strong resemblance to Neyman's (1959) asymptotically optimal $C(\alpha)$ test of a composite statistical hypothesis involving unknown nuisance parameters. Although the optimality of the test was not addressed in the previous section, below we shall prove the asymptotic optimality of our test by establishing its asymptotic equivalence with the $C(\alpha)$ test.

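Let us partition the parameter vector $\theta = (\gamma', \psi', \phi')'$ such that $\theta = (\theta_1', \theta_2')'$ where $\theta_1 = (\gamma', \phi')'$ and $\theta_2 = \psi$. Using standard notation, the optimal $C(\alpha)$ test statistic for testing $H_0: \theta_2 = \theta_{20}$ (i.e., $\psi = \psi_*$) can be written as [see Neyman (1959)]

(5.1)  $C(\alpha) = \frac{1}{N}\left[d_2(\bar{\theta}) - J_{21}(\bar{\theta})\, J_{11}^{-1}(\bar{\theta})\, d_1(\bar{\theta})\right]' \left[J_{22}(\bar{\theta}) - J_{21}(\bar{\theta})\, J_{11}^{-1}(\bar{\theta})\, J_{12}(\bar{\theta})\right]^{-1} \left[d_2(\bar{\theta}) - J_{21}(\bar{\theta})\, J_{11}^{-1}(\bar{\theta})\, d_1(\bar{\theta})\right]$

where $\bar{\theta} = (\bar{\gamma}', \psi_*', \bar{\phi}')'$ denotes a $\sqrt{N}$-consistent estimator for $\theta$ under $H_0$. It is well known that under $H_0$

(5.2)  $C(\alpha) \xrightarrow{D} \chi^2_r(0),$

and under the local alternatives $H_1: \theta_2 = \theta_{20} + \xi/\sqrt{N}$

(5.3)  $C(\alpha) \xrightarrow{D} \chi^2_r(\lambda)$

with $\lambda = \lambda(\xi) = \xi' J_{2\cdot 1}\, \xi$ where $J_{2\cdot 1} = J_{22} - J_{21} J_{11}^{-1} J_{12}$.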

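Let us first consider the part

(5.4)  $d_2(\bar{\theta}) - J_{21}(\bar{\theta})\, J_{11}^{-1}(\bar{\theta})\, d_1(\bar{\theta}) = d_\psi(\bar{\theta}) - [J_{\psi\gamma}(\bar{\theta}) \;\; J_{\psi\phi}(\bar{\theta})] \begin{bmatrix} J_\gamma(\bar{\theta}) & J_{\gamma\phi}(\bar{\theta}) \\ J_{\phi\gamma}(\bar{\theta}) & J_\phi(\bar{\theta}) \end{bmatrix}^{-1} \begin{bmatrix} d_\gamma(\bar{\theta}) \\ d_\phi(\bar{\theta}) \end{bmatrix}.$

Under the assumption $\phi_0 = \phi_* + \delta/\sqrt{N}$ we can replace $\bar{\theta}$ by $\tilde{\theta}$, noting that $d_\gamma(\tilde{\theta}) = 0$. Applying the partitioned inverse matrix formula gives

(5.5)  $d_2(\tilde{\theta}) - J_{21}(\tilde{\theta})\, J_{11}^{-1}(\tilde{\theta})\, d_1(\tilde{\theta}) = d_\psi(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}(\tilde{\theta})\, J_{\phi\cdot\gamma}^{-1}(\tilde{\theta})\, d_\phi(\tilde{\theta}).$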

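Next the variance part of the $C(\alpha)$ statistic is

(5.6)  $J_{22}(\bar{\theta}) - J_{21}(\bar{\theta})\, J_{11}^{-1}(\bar{\theta})\, J_{12}(\bar{\theta}) = J_\psi(\bar{\theta}) - [J_{\psi\gamma}(\bar{\theta}) \;\; J_{\psi\phi}(\bar{\theta})] \begin{bmatrix} J_\gamma(\bar{\theta}) & J_{\gamma\phi}(\bar{\theta}) \\ J_{\phi\gamma}(\bar{\theta}) & J_\phi(\bar{\theta}) \end{bmatrix}^{-1} \begin{bmatrix} J_{\gamma\psi}(\bar{\theta}) \\ J_{\phi\psi}(\bar{\theta}) \end{bmatrix}.$

Substitution of $\tilde{\theta}$ for $\bar{\theta}$ using the partitioned inverse matrix formula yields

(5.7)  $J_{22}(\tilde{\theta}) - J_{21}(\tilde{\theta})\, J_{11}^{-1}(\tilde{\theta})\, J_{12}(\tilde{\theta}) = J_{\psi\cdot\gamma}(\tilde{\theta}) - J_{\psi\phi\cdot\gamma}(\tilde{\theta})\, J_{\phi\cdot\gamma}^{-1}(\tilde{\theta})\, J_{\phi\psi\cdot\gamma}(\tilde{\theta}).$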

Also note that the non-centrality parameter $\lambda(\xi)$ in (5.3) is identical to $\lambda_6(\xi)$ of $LM_\psi^*$ in (4.11). Therefore it follows that under the null and the local alternatives

$LM_\psi^* \overset{a}{=} C(\alpha),$

that is, the difference between the two statistics converges in probability to zero. We have thus established the asymptotic equivalence between the $C(\alpha)$ test and $LM_\psi^*$. Being a variant of the $C(\alpha)$ test, $LM_\psi^*$ is optimal and, by construction, it is also robust, at least asymptotically, against not only complete misspecification but underspecification as well. Furthermore, since $LM_\psi^*$ is based on $\tilde{\theta} = (\tilde{\gamma}', \psi_*', \phi_*')'$, it would be easier to calculate in practice. This will be clear from our third application in the next section, where we note that sometimes it might be difficult to compute the $C(\alpha)$ test. Finally, using the terminology in Hall and Mathiason (1990, p. 88), we can claim $LM_\psi^*$ to be regular in the sense that its distribution does not depend on the value of $\delta$, but for any $\psi \neq \psi_*$ the distribution of $LM_\psi^*$ differs from the null distribution. Their condition for asymptotic efficiency is also satisfied because of the above asymptotic equivalence.


6.  Applications 

Application 6.1: Testing in the Linear Regression Model

Our first application is a simple one. It concerns testing in the linear regression
framework considered by Davidson and MacKinnon (1985, 1987). Suppose we are
interested in testing $H_0: \psi = 0$ in the linear model


(6.1.1)  $y \mid X, Z, V \sim N(X\gamma + Z\psi,\; I\sigma^2)$

whereas the local DGP belongs to

(6.1.2)  $y \mid X, Z, V \sim N(X\gamma + V\phi,\; I\sigma^2)$

with $\phi_0 = \delta/\sqrt{N}$. Here $y$ is an $(N \times 1)$ vector of observations on a dependent variable, and $X$, $Z$,
and $V$ are $(N \times m)$, $(N \times r)$, and $(N \times s)$ matrices of observations on independent variables,
respectively. Because of the block-diagonality of the information matrix between $\sigma^2$ and
the rest of the parameters, it is sufficient to consider the information matrix evaluated at
$\theta_* = (\gamma_0', \psi_*', \phi_*')' = (\gamma_0', 0', 0')'$, i.e.,


(6.1.3)  $J = \dfrac{1}{N\sigma^2} \begin{bmatrix} X'X & X'Z & X'V \\ Z'X & Z'Z & Z'V \\ V'X & V'Z & V'V \end{bmatrix}$

with

(6.1.4)  $J_{\psi\phi\cdot\gamma} = \dfrac{1}{N\sigma^2}\, Z'M_XV$,

(6.1.5)  $J_{\psi\cdot\gamma} = \dfrac{1}{N\sigma^2}\, Z'M_XZ$,

and

(6.1.6)  $J_{\phi\cdot\gamma} = \dfrac{1}{N\sigma^2}\, V'M_XV$,

where $M_X$ is the projection matrix $I - X(X'X)^{-1}X'$. As in Davidson and MacKinnon
(1985, 1987), the non-robust LM test for $H_0: \psi = 0$ based on the completely misspecified
alternative (6.1.1) can be written as

(6.1.7)  $LM_\psi = \dfrac{1}{\tilde\sigma^2}\, y'M_XZ(Z'M_XZ)^{-1}Z'M_Xy$

where $\tilde\sigma^2$ is the OLS estimate of $\sigma^2$ from the null model. It is easily seen from (6.1.4) and (6.1.5)
that the non-centrality parameter of the asymptotic distribution of (6.1.7) when the data
are generated by (6.1.2) is

(6.1.8)  $\lambda_2(\delta) = \dfrac{1}{N\sigma^2}\, \delta'V'M_XZ(Z'M_XZ)^{-1}Z'M_XV\delta$.

[See Davidson and MacKinnon (1985, 1987) for a geometric interpretation of the
non-centrality parameter (6.1.8).] Since this non-centrality parameter is in general different from zero,
the standard LM test based on (6.1.1) would lead to incorrect inference. The asymptotically
robust test (4.9) can be readily obtained as

(6.1.9)  $LM^*_\psi = \dfrac{1}{\tilde\sigma^2}\, [\,y'M_XZ - y'M_XV(V'M_XV)^{-1}V'M_XZ\,]\, [\,Z'M_XZ - Z'M_XV(V'M_XV)^{-1}V'M_XZ\,]^{-1}\, [\,Z'M_Xy - Z'M_XV(V'M_XV)^{-1}V'M_Xy\,]$.

It can also be shown that the C(α) test for testing $\psi = 0$ in $y = X\gamma + Z\psi + V\phi + u$ and
$LM^*_\psi$ have the same algebraic form.
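
As an illustration of (6.1.7) and (6.1.9), the sketch below computes both statistics with NumPy on simulated data. The function names, the simulated design, and the use of $\tilde u'\tilde u/N$ for $\tilde\sigma^2$ are assumptions made for this example only.

```python
import numpy as np

def annihilator(X):
    """M_X = I - X (X'X)^{-1} X'."""
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

def lm_tests(y, X, Z, V):
    """Return (LM_psi, LM*_psi) following (6.1.7) and (6.1.9)."""
    N = len(y)
    M = annihilator(X)
    u = M @ y                                # OLS residuals from the null model
    s2 = u @ u / N                           # assumed estimate of sigma^2 (divisor N)
    MZ, MV = M @ Z, M @ V
    score = Z.T @ u                          # Z' M_X y
    lm = score @ np.linalg.solve(Z.T @ MZ, score) / s2
    A = Z.T @ MV @ np.linalg.inv(V.T @ MV)   # Z' M_X V (V' M_X V)^{-1}
    adj_score = score - A @ (V.T @ u)        # score corrected for d_phi
    adj_info = Z.T @ MZ - A @ (V.T @ MZ)     # corrected variance term
    lm_star = adj_score @ np.linalg.solve(adj_info, adj_score) / s2
    return lm, lm_star

# Hypothetical data: psi = 0 holds, but V enters the DGP and is correlated with Z
rng = np.random.default_rng(1)
N = 200
X = np.column_stack([np.ones(N), rng.standard_normal(N)])
V = rng.standard_normal((N, 1))
Z = 0.7 * V + rng.standard_normal((N, 1))
y = X @ np.array([1.0, 0.5]) + 0.3 * V[:, 0] + rng.standard_normal(N)

lm, lm_star = lm_tests(y, X, Z, V)
print(lm, lm_star)
```

By the Frisch-Waugh-Lovell identity, the corrected terms in (6.1.9) equal $Z'M_{XV}y$ and $Z'M_{XV}Z$, where $M_{XV}$ projects off both $X$ and $V$. This is one way to see why the robust statistic and the C(α) statistic share the same algebraic form.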


Application 6.2: Testing for Autocorrelation in the Presence of a Lagged Dependent Variable

We consider a simplified version of the example due to Durbin (1970). Suppose we
have the following regression model

(6.2.1)  $y_t = \phi y_{t-1} + x_t'\gamma + u_t \quad (t = 1, 2, \ldots, N)$,
         $u_t = \psi u_{t-1} + \epsilon_t, \quad |\psi| < 1$,
         $\epsilon_t \sim IIDN(0, \sigma^2)$,



where $y_0$ is given and $u_0$ is assumed to be fixed. Here $\psi$ and $\phi$ are scalar parameters and
$x_t$ is a vector of fixed regressors. We are concerned with the problem of testing $H_0: \psi = 0$
in the presence of the nuisance parameter $\phi$. As in Application 6.1, we need only
consider the scores and the information matrix for the parameter vector $\theta = (\gamma', \psi, \phi)'$
evaluated at $\theta_* = (\gamma_0', \psi_*, \phi_*)' = (\gamma_0', 0, 0)'$, because of the block-diagonality between $\sigma^2$ and the
other parameters of the model. We have

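$d_\gamma = \dfrac{1}{\sigma^2}\, X'u$,

$d_\psi = \dfrac{1}{\sigma^2}\, u'u_{-1}$,

$d_\phi = \dfrac{1}{\sigma^2}\, u'y_{-1}$,

(6.2.2)  $J = \mathrm{plim}\, \dfrac{1}{N\sigma^2} \begin{bmatrix} X'X & 0 & X'y_{-1} \\ 0 & N\sigma^2 & N\sigma^2 \\ y_{-1}'X & N\sigma^2 & y_{-1}'y_{-1} \end{bmatrix}$,

(6.2.3)  $J_{\psi\phi\cdot\gamma} = 1$,

(6.2.4)  $J_{\psi\cdot\gamma} = 1$,

and

(6.2.5)  $J_{\phi\cdot\gamma} = \mathrm{plim}\, \dfrac{1}{N\sigma^2}\, \{\, y_{-1}'y_{-1} - y_{-1}'X(X'X)^{-1}X'y_{-1} \,\}$,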


where $u = (u_1, u_2, \ldots, u_N)'$, $u_{-1} = (u_0, u_1, \ldots, u_{N-1})'$, $y_{-1} = (y_0, y_1, \ldots, y_{N-1})'$, and
$X = (x_1, x_2, \ldots, x_N)'$. Since $J_{\psi\phi\cdot\gamma} \neq 0$, indicating asymptotic correlation between the scores
$d_\psi$ and $d_\phi$, the Durbin-Watson test is not valid asymptotically, as discussed in Nerlove and
Wallis (1966). In this situation the asymptotically robust test (4.9) can be readily obtained
as

(6.2.6)  $LM^*_\psi = \dfrac{N\,[\,\tilde u'\tilde u_{-1}(\tilde u'\tilde u)^{-1} - \{y_{-1}'y_{-1} - y_{-1}'X(X'X)^{-1}X'y_{-1}\}^{-1}\,\tilde u'y_{-1}\,]^2}{1 - \tilde u'\tilde u\,\{y_{-1}'y_{-1} - y_{-1}'X(X'X)^{-1}X'y_{-1}\}^{-1}}$

where "~" indicates that the quantities have been evaluated at $\tilde\theta = (\tilde\gamma', 0, 0)'$, so that $\tilde u$ are
the OLS residuals from the regression $y_t = x_t'\gamma + u_t$.
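
The statistic in (6.2.6) needs only one OLS regression of $y_t$ on $x_t$. A minimal sketch, assuming NumPy; the function name, the simulated data, and the choice $\tilde u_0 = 0$ for the initial residual are assumptions of this example rather than part of the paper.

```python
import numpy as np

def lm_star_psi(y, y_lag, X):
    """Robust LM statistic (6.2.6) for H0: psi = 0, from OLS of y on X only."""
    N = len(y)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ coef                              # OLS residuals from the null model
    u_lag = np.concatenate([[0.0], u[:-1]])       # u_0 set to 0 (assumption of this sketch)
    uu = u @ u
    Xy = X.T @ y_lag
    Q = y_lag @ y_lag - Xy @ np.linalg.solve(X.T @ X, Xy)   # y_{-1}' M_X y_{-1}
    num = (u @ u_lag) / uu - (u @ y_lag) / Q
    return N * num**2 / (1.0 - uu / Q)

# Hypothetical data generated under psi = phi = 0
rng = np.random.default_rng(2)
N = 300
x = rng.standard_normal(N + 1)
y_full = 1.0 + 1.0 * x + rng.standard_normal(N + 1)
y, y_lag = y_full[1:], y_full[:-1]
X = np.column_stack([np.ones(N), x[1:]])

stat = lm_star_psi(y, y_lag, X)
print(stat)   # compare with a chi-square(1) critical value, e.g. 3.84 at the 5% level
```

The denominator $1 - \tilde u'\tilde u/Q$ approaches zero when $y_{-1}'M_Xy_{-1}$ is close to $\tilde u'\tilde u$, for example when $X$ contains only a constant. In that case $\psi$ and $\phi$ are hard to distinguish locally and the statistic becomes unstable.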

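Let us compare the $LM^*_\psi$ in (6.2.6) with Durbin's (1970) h statistic and the Durbin-
Watson statistic. Under $H_0: \psi = 0$, the MLEs of $\gamma$ and $\phi$ are given by the OLS estimates,
$\hat\gamma$ and $\hat\phi$ say. If $\hat u$ are the OLS residuals from the regression $y_t = \phi y_{t-1} + x_t'\gamma + u_t$, the MLE
of $\psi$ given $\hat\gamma$ and $\hat\phi$ is $\hat\psi = \hat u'\hat u_{-1}(\hat u_{-1}'\hat u_{-1})^{-1} = N^{-1}\hat d_\psi$. Note also that $d_\phi(\hat\gamma, 0, \hat\phi) = 0$.
Since $J_{\phi\cdot\gamma}^{-1} = (J_\phi - J_{\phi\gamma}J_\gamma^{-1}J_{\gamma\phi})^{-1}$ equals $N$ times the asymptotic variance of $\hat\phi$, estimating
this quantity by $\hat J_{\phi\cdot\gamma}^{-1} = \hat u'\hat u\,\{y_{-1}'y_{-1} - y_{-1}'X(X'X)^{-1}X'y_{-1}\}^{-1}$ gives the Durbin h
statistic

(6.2.7)  $h^2 = \dfrac{\hat d_\psi^{\,2}}{N(1 - \hat J_{\phi\cdot\gamma}^{-1})}$.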


By setting $\hat\phi = 0$, the Durbin-Watson statistic can also be written asymptotically as

(6.2.8)  $DW = \dfrac{\tilde d_\psi^{\,2}}{N}$.

From (6.2.6) and (6.2.8) it is clearly seen that $LM^*_\psi$ modifies the Durbin-Watson
statistic by adjusting the mean and variance of the score for the asymptotic correlation
between $d_\psi$ and $d_\phi$. The Durbin h statistic in (6.2.7) may be interpreted similarly, but it
has no equivalent to the mean correction factor $\tilde J_{\phi\cdot\gamma}^{-1}\tilde d_\phi$ of $LM^*_\psi$, since Durbin estimates $\phi$
by $\hat\phi$, for which $d_\phi = 0$.

The computational advantage of our procedure becomes apparent when one notes that
Durbin's (1970) procedure cannot be easily implemented if one's interest is in testing
$H_0: \phi = 0$ in the model (6.2.1). In this case application of Durbin's procedure would
require the MLE of $\gamma$ and $\psi$ under $H_0$, and the OLS estimates are inappropriate for this
purpose. However, $LM^*_\phi$ can be computed easily using the standard OLS method to yield

(6.2.9)  $LM^*_\phi = \dfrac{(\tilde d_\phi - \tilde d_\psi)^2}{N(\tilde J_{\phi\cdot\gamma} - 1)} = \dfrac{N\,[\,\tilde u'y_{-1}(\tilde u'\tilde u)^{-1} - \tilde u'\tilde u_{-1}(\tilde u'\tilde u)^{-1}\,]^2}{(\tilde u'\tilde u)^{-1}\{y_{-1}'y_{-1} - y_{-1}'X(X'X)^{-1}X'y_{-1}\} - 1}$

where, as in $LM^*_\psi$, $\tilde u$ are the OLS residuals from the regression $y_t = x_t'\gamma + u_t$.

Application 6.3: Testing for Heterogeneity and Duration Dependence

In specification diagnostics for econometric duration models, considerable attention
has been paid to testing for unobserved heterogeneity and duration dependence [see, for
example, Lancaster (1985), Kiefer (1984), and Jensen (1986)]. It has also been recognized,
as emphasized in Kiefer's (1988, pp. 671-672) survey, that heterogeneity in duration
models leads to misleading inferences about duration dependence, and in fact that
neglected heterogeneity biases the estimated hazard toward negative duration dependence.
Therefore, the available one-directional tests proposed by the above authors will not be
valid in the presence of such misspecification. Simulation results reported by Jaggia and
Trivedi (1990) also highlight the inappropriateness of such one-directional tests. We apply our modified LM
test (4.9) to obtain tests for unobserved heterogeneity (or duration dependence) which are
asymptotically valid in the presence of duration dependence (or unobserved heterogeneity).

Following Lancaster (1985), the approximate density function, $g(t)$, of a locally
heterogeneous Weibull model for small $\sigma^2$, the variance of the heterogeneity term, is given
by

(6.3.1)  $g(t) = f(t)\left\{1 + \dfrac{\sigma^2}{2}(\epsilon^2 - 2\epsilon)\right\}$

where $f(t)$ is the Weibull probability density function with no heterogeneity, i.e.,

$f(t) = \alpha t^{\alpha-1} \exp\{x'\beta\} \exp\{-t^\alpha \exp\{x'\beta\}\}$.

The parameter $\alpha$ represents duration dependence of the hazard function. The hazard
function is increasing in duration (positive duration dependence) if $\alpha > 1$, and decreasing
(negative duration dependence) if $\alpha < 1$. Setting $\alpha = 1$ gives the exponential distribution,
for which the hazard function is constant (no duration dependence). Here $\epsilon$
is the generalized error of the Weibull model in the sense of Cox and Snell [see Lancaster
(1985)], i.e.,

(6.3.2)  $\epsilon = t^\alpha \exp\{x'\beta\} = t^\alpha \exp\{\beta_0 + x_1'\beta_1\}$,

assuming that $E(x_1) = 0$.

Now consider testing for neglected heterogeneity, $H_0: \sigma^2 = 0$, in the presence of
duration dependence. For this the $LM^*_\psi$ can be constructed as follows. We set $\gamma = \beta =
(\beta_0, \beta_1')'$, $\psi = \sigma^2$, and $\phi = \alpha$, where $\beta_1$ is a $(k - 1) \times 1$ vector and $\beta_0$, $\sigma^2$, and $\alpha$ are scalar
parameters. Note that here $r = s = 1$. From (6.3.1), the log-density function for the $i$-th
observation is

(6.3.3)  $l_i(\theta) = \ln g(t_i) = \ln\alpha + (\alpha - 1)\ln t_i + \beta_0 + x_{1i}'\beta_1 - \epsilon_i + \ln\left\{1 + \dfrac{\sigma^2}{2}(\epsilon_i^2 - 2\epsilon_i)\right\}$
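where $t_i$ denotes complete duration spells and $x_{1i}$ is a vector of explanatory variables for
$i = 1, \ldots, N$. Since $LM^*_\psi$ is to be evaluated at $\tilde\theta = (\tilde\gamma', \psi_*, \phi_*)' = (\tilde\beta_0, \tilde\beta_1', 0, 1)'$, the scores
evaluated at $\theta_* = (\beta_0, \beta_1', 0, 1)'$ and expressed in terms of the generalized error (6.3.2) are
given by

$d(\theta_*) = \begin{bmatrix} d_{\beta_0} \\ d_{\beta_1} \\ d_\psi \\ d_\phi \end{bmatrix} = \begin{bmatrix} \sum_i (1 - \epsilon_i) \\ \sum_i x_{1i}(1 - \epsilon_i) \\ \frac{1}{2}\sum_i (\epsilon_i^2 - 2\epsilon_i) \\ \sum_i \{1 + (1 - \epsilon_i)\ln t_i\} \end{bmatrix}$.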

Using the fact that $E(\epsilon^j) = j!$ for $j = 1, 2, \ldots$, and some properties of the polygamma
functions [see, for example, Jaggia and Trivedi (1990)], the information matrix evaluated
at $\theta_*$ is given by

(6.3.4)  $J = \begin{bmatrix} 1 & 0' & -1 & \Psi(2) - \beta_0 \\ 0 & \Lambda & 0 & -\Lambda\beta_1 \\ -1 & 0' & 2 & \beta_0 - \Psi(2) - 1 \\ \Psi(2) - \beta_0 & -\beta_1'\Lambda & \beta_0 - \Psi(2) - 1 & J_\phi \end{bmatrix}$

where

$J_\phi = 1 + \Psi'(2) + (\Psi(2))^2 + \beta_0^2 + \beta_1'\Lambda\beta_1 - 2\Psi(2)\beta_0$

and $\Lambda = E(x_1x_1')$. $\Psi(\cdot)$ and $\Psi'(\cdot)$ denote the digamma and the trigamma functions,
respectively. From (6.3.4) some computation yields

(6.3.5)  $J_{\psi\phi\cdot\gamma} = -1$,

(6.3.6)  $J_{\psi\cdot\gamma} = 1$,

and

(6.3.7)  $J_{\phi\cdot\gamma} = \Psi'(1) = 1.6449$.
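
The reduction of (6.3.4) to these three constants can be checked numerically. The sketch below uses only the Python standard library and, as a simplifying assumption, treats $x_1$ as a scalar so that $\Lambda$ is a number; the function name and the trial parameter values are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329
PSI2 = 1.0 - EULER_GAMMA              # digamma(2)
TRI2 = math.pi ** 2 / 6.0 - 1.0       # trigamma(2)

def partial_information(beta0, beta1, lam):
    """(J_{psi phi.gamma}, J_{psi.gamma}, J_{phi.gamma}) from (6.3.4), scalar x_1."""
    # gamma = (beta0, beta1) with J_gamma = diag(1, lam), so J_gamma^{-1} = diag(1, 1/lam)
    j_gamma_inv = (1.0, 1.0 / lam)
    j_psi_gamma = (-1.0, 0.0)                       # row of psi against gamma
    j_phi_gamma = (PSI2 - beta0, -lam * beta1)      # row of phi against gamma
    j_psi = 2.0
    j_psi_phi = beta0 - PSI2 - 1.0
    j_phi = (1.0 + TRI2 + PSI2 ** 2 + beta0 ** 2
             + beta1 * lam * beta1 - 2.0 * PSI2 * beta0)

    def quad(a, b):
        # a' J_gamma^{-1} b for a diagonal J_gamma
        return sum(ai * wi * bi for ai, wi, bi in zip(a, j_gamma_inv, b))

    return (j_psi_phi - quad(j_psi_gamma, j_phi_gamma),
            j_psi - quad(j_psi_gamma, j_psi_gamma),
            j_phi - quad(j_phi_gamma, j_phi_gamma))

print(partial_information(beta0=0.3, beta1=-1.2, lam=2.5))
```

Whatever values are chosen for $\beta_0$, $\beta_1$, and $\Lambda$, the three quantities do not change, which is why the resulting test statistics involve only numerical constants.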


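Note that $J_{\psi\phi\cdot\gamma} \neq 0$. This implies asymptotic dependence between the scores $d_\psi$ and
$d_\phi$, invalidating the conventional one-directional tests, $LM_\psi$ and $LM_\phi$ say. Straightforward
calculation gives

(6.3.8)  $LM^*_\psi = \dfrac{2.5506}{N} \left[\, \dfrac{1}{2}\sum_i (\tilde\epsilon_i^2 - 2\tilde\epsilon_i) + .6079 \sum_i \{1 + (1 - \tilde\epsilon_i)\ln t_i\} \right]^2$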


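where $\tilde\epsilon_i = t_i \exp\{\tilde\beta_0 + x_{1i}'\tilde\beta_1\}$, and $\tilde\beta_0$ and $\tilde\beta_1$ are the MLEs from the exponential regression
model. By switching the roles of the parameters $\psi$ and $\phi$ it is also easy to construct $LM^*_\phi$
for testing no duration dependence, $H_0: \alpha = 1$, in the heterogeneous Weibull model, as

(6.3.9)  $LM^*_\phi = \dfrac{1.5506}{N} \left[\, \sum_i \{1 + (1 - \tilde\epsilon_i)\ln t_i\} + \dfrac{1}{2}\sum_i (\tilde\epsilon_i^2 - 2\tilde\epsilon_i) \right]^2$.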
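
Both robust statistics depend on the data only through the durations and the generalized residuals from the exponential fit. A minimal standard-library sketch follows; the function name is hypothetical, and the usage example assumes the simplest case of an exponential model with an intercept only, for which $\tilde\beta_0 = -\ln \bar t$ and $\tilde\epsilon_i = t_i/\bar t$.

```python
import math
import random

J_PHI = math.pi ** 2 / 6.0     # J_{phi.gamma} = trigamma(1) = 1.6449...

def robust_duration_tests(t, eps):
    """Return (LM*_psi, LM*_phi) of (6.3.8) and (6.3.9).

    t   : durations
    eps : generalized residuals from the exponential regression fit
    """
    n = len(t)
    d_psi = 0.5 * sum(e * e - 2.0 * e for e in eps)
    d_phi = sum(1.0 + (1.0 - e) * math.log(ti) for ti, e in zip(t, eps))
    # J_{psi phi.gamma} = -1 and J_{psi.gamma} = 1, so the corrections enter with a plus sign
    lm_psi = (d_psi + d_phi / J_PHI) ** 2 / (n * (1.0 - 1.0 / J_PHI))
    lm_phi = (d_phi + d_psi) ** 2 / (n * (J_PHI - 1.0))
    return lm_psi, lm_phi

# Hypothetical usage: exponential durations, intercept-only model (an assumption)
random.seed(0)
t = [random.expovariate(1.0) for _ in range(500)]
t_bar = sum(t) / len(t)
eps = [ti / t_bar for ti in t]          # eps_i = t_i * exp(beta0_tilde)
lm_psi, lm_phi = robust_duration_tests(t, eps)
print(lm_psi, lm_phi)                   # each is compared with a chi-square(1) critical value
```

The constants in the two formulas follow from $J_{\phi\cdot\gamma} = \pi^2/6$: $1/J_{\phi\cdot\gamma} = .6079$, $1/(1 - 1/J_{\phi\cdot\gamma}) = 2.5506$, and $1/(J_{\phi\cdot\gamma} - 1) = 1.5506$.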


Jaggia and Trivedi (1990) reported some Monte Carlo results on the performance of
the one-directional LM tests of heterogeneity and duration dependence, $LM_\psi$ and $LM_\phi$
respectively, the joint test $LM_{\psi\phi}$, and the C(α) test of heterogeneity, which is
asymptotically equivalent to the $LM^*_\psi$ in (6.3.8). In the following discussion we demonstrate that the
empirical power results of these tests obtained in their simulation experiments could easily
be anticipated from the results of Section 3. First, the non-centrality parameter $\lambda_2(\delta)$ in
(3.4) of $LM_\psi$ under complete misspecification is, from (6.3.5) and (6.3.6),

$\lambda_2(\delta) = \delta^2 > 0$.

Further, the ARE of the completely misspecified test $LM_\psi$ with respect to the optimal test
$LM_\phi$ is given by [see Saikkonen (1989)]

$ARE = \dfrac{J_{\phi\psi\cdot\gamma}J_{\psi\cdot\gamma}^{-1}J_{\psi\phi\cdot\gamma}}{J_{\phi\cdot\gamma}} = \dfrac{1}{1.6449} = .6079$

which does not depend on $\delta$. This implies that the asymptotic relative powers of $LM_\psi$
and $LM_\phi$ do not change with the value of $\delta$. In Jaggia and Trivedi (1990) the percentage
rejections for the completely misspecified test $LM_\psi$ at the 5% nominal significance level were
96.0, 95.0, and 100.0 when $\phi\ (= \alpha)$ was set at .75, 1.3, and 1.45 respectively, while for
these cases the optimal test $LM_\phi$ had power 100.0, 99.6, and 100.0. Under the same setup
the C(α) test showed rejection proportions close to the nominal level. We anticipate that
our $LM^*_\psi$ will have a similar property.

Given the results of Section 3, we can also easily evaluate the asymptotic powers of
standard LM tests under misspecification. Figure 2 plots the asymptotic power of the
completely misspecified $LM_\psi$ for a range of values of $\phi$. Note the severe distortion of
the size of the misspecified test except for the case where $\phi = 1$. Our computation of
asymptotic power using the same sample size as in Jaggia and Trivedi (1990) yields results
similar to theirs. For instance, we obtain the asymptotic powers 94.2, 98.9, and 100.0 for
the corresponding values of $\phi$, .75, 1.3, and 1.45 respectively.
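
The asymptotic rejection probabilities can be obtained from the non-central $\chi^2_1$ distribution with $\lambda_2(\delta) = \delta^2$ and $\delta = \sqrt{N}(\phi - 1)$. The following standard-library sketch illustrates the calculation. The sample size is not stated in this section, so `N = 200` is an assumption of the example, chosen because it gives values in line with the figures quoted above.

```python
import math
from statistics import NormalDist

def power_chi2_1(lam, alpha=0.05):
    """P(chi2_1(lam) > c_alpha), using chi2_1(lam) = (Z + sqrt(lam))^2 with Z standard normal."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)      # sqrt of the central chi2_1 critical value
    m = math.sqrt(lam)
    return nd.cdf(m - z) + nd.cdf(-m - z)

N = 200   # assumed sample size (not given in this section)

for phi in (0.75, 1.0, 1.3, 1.45):
    delta = math.sqrt(N) * (phi - 1.0)     # local departure in the nuisance direction
    lam2 = delta ** 2                      # non-centrality of the misspecified LM_psi
    print(phi, round(100.0 * power_chi2_1(lam2), 1))
```

At $\phi = 1$ the non-centrality is zero and the rejection probability equals the nominal size; away from $\phi = 1$ it rises quickly even though the null $\psi = 0$ is true.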

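Next, turning to the underspecification case, the non-centrality parameter of the
underspecified $LM_\psi$ in (3.5) is

$\lambda_3(\xi, \delta) = \lambda_1(\xi) + \lambda_2(\delta) + 2\xi'J_{\psi\phi\cdot\gamma}\delta$

where $\lambda_1(\xi) = \xi'J_{\psi\cdot\gamma}\xi = \xi^2$, and hence,

$\lambda_3 - \lambda_1 = \delta(\delta - 2\xi)$.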
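
For reference, the algebra behind this difference can be written out by substituting $J_{\psi\cdot\gamma} = 1$ and $J_{\psi\phi\cdot\gamma} = -1$ from (6.3.5) and (6.3.6), together with $\lambda_2(\delta) = \delta^2$:

```latex
\begin{aligned}
\lambda_3(\xi,\delta)
  &= \lambda_1(\xi) + \lambda_2(\delta) + 2\,\xi' J_{\psi\phi\cdot\gamma}\,\delta
   = \xi^2 + \delta^2 - 2\xi\delta
   = (\xi - \delta)^2, \\
\lambda_3 - \lambda_1
  &= \delta^2 - 2\xi\delta
   = \delta(\delta - 2\xi).
\end{aligned}
```

In this application the non-centrality of the underspecified test is therefore a function of $\xi - \delta$ only.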

Observe that the simultaneous presence of negative duration dependence ($\delta < 0$) and
heterogeneity ($\xi > 0$) makes $\lambda_3$ greater than $\lambda_1$. Jaggia and Trivedi (1990) reported that the
underspecified $LM_\psi$ yielded a 100.0 rejection proportion for $\psi = .6$ and $\phi = .75$, whereas
the correct $LM_\psi$ (with $\psi = .6$ and $\phi = 1$) achieved 99.0. This Monte Carlo evidence is in
complete agreement with the above theoretical result. On the other hand, underspecification
can also reduce the power, as argued in Section 3. This is the case when heterogeneity
and positive duration dependence are jointly present such that $0 < \delta < 2\xi$. In Jaggia and
Trivedi (1990) the rejection percentage for the underspecified test $LM_\psi$ was as low as 15.2
when $\psi = .6$ and $\phi = 1.45$, while the lowest percentage rejection for the underspecified
$LM_\phi$ was 7.4 with $\psi = .6$ and $\phi = 1.3$. Also, the simultaneous presence of positive duration
dependence and heterogeneity in "equal amount", i.e., $\xi = \delta$, makes the non-centrality
parameter of the underspecified $LM_\psi$ equal to zero. Note that a similar analysis of the effects
of positive or negative duration dependence on the underspecified test could also be done
based on the ARE, since the ARE of the underspecified test $LM_\psi$ may be expressed as

$ARE = \dfrac{\lambda_3}{\lambda_1} = (\delta/\xi - 1)^2$.

Figure 3 shows the asymptotic power of the underspecified test $LM_\psi$ with $\psi$ set to .6.
As indicated above, $LM_\psi$ has no power beyond its nominal size when $\psi = .6$ and $\phi = 1.6$, i.e., when $\xi - \delta = 0$.
Jaggia and Trivedi (1990) did not choose this particular parameter configuration; they had
$\phi$ values up to 1.45. From the figure it is clear that the power is monotonic in $(\xi - \delta)^2$.

Now we consider overspecification. In Section 3 we have shown that the non-centrality
parameter $\lambda_5(\xi)$ of the overspecified test $LM_{\psi\phi}$ in (3.7) is identical to $\lambda_1(\xi)$ of the optimal
test $LM_\psi$ in (3.3), and hence $LM_{\psi\phi}$ is expected to be less powerful than $LM_\psi$ due
to the higher degrees of freedom associated with the joint test. This can also be easily
verified by evaluating the ARE of the overspecified $LM_{\psi\phi}$ with respect to the optimal
$LM_\psi$. Since $\lambda_5 = \lambda_1$, the ARE is obtained simply as the following ratio [see Saikkonen
(1989)]

$ARE = \dfrac{d(1, \alpha, \beta)}{d(2, \alpha, \beta)}$.

Here $\alpha$ and $\beta$ stand for the nominal significance level and a given power, respectively, for
both tests. $d(k, \alpha, \beta)$ is the non-centrality parameter such that the $1 - \beta$ fractile of the
$\chi^2_k(d)$ distribution and the $1 - \alpha$ fractile of the $\chi^2_k(0)$ distribution coincide. [For tabulated
values of $d(k, \alpha, \beta)$, see Saikkonen (1989) and the references cited therein.] For $\alpha = .05$
and some different values of $\beta$, the table included in Saikkonen (1989, p. 359) gives

    β      .25     .50     .70     .90
    ARE    .730    .775    .801    .830

As expected, Jaggia and Trivedi (1990) also reported some loss of power of the overspecified
joint test. For instance, the percentage rejection for the $LM_{\psi\phi}$ with $\psi = 0$ and $\phi = 1.3$
was 98.8, whereas the correct $LM_\phi$ obtained 99.6.
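
The ARE values in the table can be approximated without statistical tables. The sketch below uses only the Python standard library: the one-degree-of-freedom case is computed from the normal distribution, and the two-degree-of-freedom case from the Poisson-mixture representation of the non-central $\chi^2$. The function names and the bisection bounds are choices made for this illustration.

```python
import math
from statistics import NormalDist

def sf_df1(c, lam):
    """P(chi2_1(lam) > c) via the normal distribution."""
    nd = NormalDist()
    m, s = math.sqrt(lam), math.sqrt(c)
    return nd.cdf(m - s) + nd.cdf(-m - s)

def sf_df2(c, lam, terms=400):
    """P(chi2_2(lam) > c) as a Poisson(lam/2) mixture of central chi2 with 2, 4, 6, ... d.f."""
    weight = math.exp(-lam / 2.0)      # Poisson probability of i = 0
    term = math.exp(-c / 2.0)          # exp(-c/2) (c/2)^j / j! at j = 0
    tail = term                        # P(chi2_2 > c)
    total = 0.0
    for i in range(terms):
        total += weight * tail
        weight *= (lam / 2.0) / (i + 1)
        term *= (c / 2.0) / (i + 1)
        tail += term                   # now P(chi2_{2(i+2)} > c)
    return total

def noncentrality(sf, c, beta):
    """Solve sf(c, d) = beta for d by bisection (sf is increasing in d)."""
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sf(c, mid) < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def are_overspecified(beta, alpha=0.05):
    """d(1, alpha, beta) / d(2, alpha, beta)."""
    c1 = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2   # chi2_1 critical value
    c2 = -2.0 * math.log(alpha)                          # chi2_2 critical value
    return noncentrality(sf_df1, c1, beta) / noncentrality(sf_df2, c2, beta)

for beta in (0.25, 0.50, 0.70, 0.90):
    print(beta, round(are_overspecified(beta), 3))
```

The ratio rises with the target power, so the relative cost of the extra degree of freedom is largest when the tests have low power.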

On the basis of our ARE computation and the simulation results of Jaggia and Trivedi
(1990), we may thus conclude that the effect of overspecification is not so severe, while
underspecification can lead to seriously misleading inferences, especially when positive
duration dependence is present jointly with heterogeneity. As mentioned in Section 1, a similar
conclusion was drawn in Bera and Jarque (1982) based on their Monte Carlo results in the
context of testing the linear regression model.

Finally, the non-centrality parameter of the robust $LM^*_\psi$ is obtained from (4.11) as

$\lambda_6(\xi) = \xi'(J_{\psi\cdot\gamma} - J_{\psi\phi\cdot\gamma}J_{\phi\cdot\gamma}^{-1}J_{\phi\psi\cdot\gamma})\xi = .3921\,\xi^2 = .3921\,\lambda_1(\xi)$.

Therefore, $\lambda_6(\xi) < \lambda_1(\xi)$, and this is due to the asymptotic dependence between $d_\psi$ and $d_\phi$ as
expressed by the quantity $J_{\psi\phi\cdot\gamma} = -1$. One may view the resulting loss of power of $LM^*_\psi$
as a risk premium associated with size correction. As a result it is not surprising that the
C(α) test had less power than the correctly specified $LM_\psi$ in Jaggia and Trivedi's (1990)
simulation. Specifically, the percentage rejections for the C(α) test were 78.6, 76.2, 79.6,
and 80.0 for the corresponding values of $\phi$, 1.0, .75, 1.3, and 1.45 respectively, with $\psi$ set
to .6, whereas the correct $LM_\psi$ with $\psi = .6$ and $\phi = 1$ attained a 99.0 rejection percentage.
Therefore, our analytical results of Sections 3 and 4 explain the Monte Carlo evidence
quite satisfactorily.

The properties of $LM^*_\psi$ can also be highlighted by comparing its asymptotic power
with that of the standard $LM_\psi$ in the presence of the nuisance parameter. In Figure 4 the
solid line depicts the asymptotic power of $LM^*_\psi$ with $\phi$ set to 1.45, whereas the dotted line
represents the asymptotic power function of the (underspecified) test $LM_\psi$. As emphasized
before, $LM^*_\psi$ has the correct size .05, with power increasing monotonically as $\psi$ deviates
from the null. On the other hand, $LM_\psi$ has size almost equal to 1.0 (!) and a nonmonotonic
power function. Note again that the non-centrality parameter of $LM_\psi$ collapses to zero
when $\psi = .45$ with $\phi = 1.45$.
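
The qualitative shape of the two power functions follows from the non-centrality parameters alone: $(\xi - \delta)^2$ for the underspecified $LM_\psi$ and $.3921\,\xi^2$ for $LM^*_\psi$, with $\xi = \sqrt{N}\psi$ and $\delta = \sqrt{N}(\phi - 1)$. The standard-library sketch below evaluates both. The value `N = 200` is an assumption (the sample size is not stated in this section), and the function names are illustrative.

```python
import math
from statistics import NormalDist

def power_chi2_1(lam, alpha=0.05):
    """P(chi2_1(lam) > c_alpha) via the normal distribution."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)
    m = math.sqrt(lam)
    return nd.cdf(m - z) + nd.cdf(-m - z)

N = 200                                   # assumed sample size
PHI = 1.45                                # nuisance parameter value used for Figure 4
DELTA = math.sqrt(N) * (PHI - 1.0)
SHRINK = 1.0 - 6.0 / math.pi ** 2         # 1 - 1/J_{phi.gamma} = .3921

def power_underspecified(psi):
    """Asymptotic power of LM_psi, non-centrality (xi - delta)^2."""
    xi = math.sqrt(N) * psi
    return power_chi2_1((xi - DELTA) ** 2)

def power_robust(psi):
    """Asymptotic power of LM*_psi, non-centrality .3921 xi^2."""
    xi = math.sqrt(N) * psi
    return power_chi2_1(SHRINK * xi ** 2)

for psi in (0.0, 0.15, 0.30, 0.45, 0.60):
    print(psi, round(power_underspecified(psi), 3), round(power_robust(psi), 3))
```

At $\psi = 0$ the robust test rejects at the nominal rate while the underspecified test rejects almost always, and at $\psi = .45$ the underspecified test falls back to its nominal size.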
7.  Concluding  Remarks

We have shown that our robust form of the LM test ($LM^*_\psi$) is asymptotically equivalent to
Neyman's C(α) test; in finite samples, however, their performance could be quite different.
One attractive feature of $LM^*_\psi$ is that it is computationally simpler than C(α). Given
the current state of computational know-how, that may not be a major advantage. There
are, however, some practical cases where the C(α) statistic is very difficult to compute. For
example, in our last application, the C(α) statistic for testing no duration dependence in the
heterogeneity model is not easy to evaluate. That is why Jaggia and Trivedi (1990) did
not report any Monte Carlo results for this case. Our test statistic, however, has a very
simple form, as given in (6.3.9).

There are a number of other practical situations where our procedure could be
applied. In the previous section, we outlined three cases. One important application could
be testing normality and homoskedasticity in limited dependent variable models. Here
the information matrix is not block diagonal, and estimation allowing for the nuisance
parameter is computationally very demanding. Using our procedure, it is possible to
develop tests for normality in the presence of heteroskedasticity and vice versa. Another
interesting area of application is specification testing for the autoregressive conditional
heteroskedasticity (ARCH) models of Engle (1982). As indicated in Bera, Lee and Higgins
(1990), there are some asymmetric ARCH models for which the information matrix is not
block diagonal between the regression and ARCH parameters. One consequence of this is
that if the ARCH process is misspecified, the standard likelihood approach yields inconsistent
estimators for the regression coefficients. Also, the conventional t-test
for the regression parameters is not valid. Our approach could be adapted to this model,
which would help to make robust inference for the regression coefficients in the presence
of asymmetric ARCH.

As in the C(α) test, the drawbacks of our approach are that complete specification of the
full model is required and that we allow only for local departures. This is not the case in
Wooldridge's (1990) procedure; for example, his suggested test for the conditional mean does
not require correct specification of the conditional variance, and the departures could be of
a global nature. However, we should note that Wooldridge's (1990) approach is not applicable
when the information matrix is not block diagonal, since he requires consistency of the
parameter estimates under misspecification. Since our approach is very much in the spirit
of Neyman's C(α) procedure, we do not see any immediate solution to these two drawbacks.
Even with these restrictions, as we indicated earlier, there are many econometric problems
where our approach could be applied fruitfully.

Appendix

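To derive the result stated in Case 3 we denote the true parameter vector by
$\theta_{00} = (\gamma_0', \psi_0', \phi_0')'$, where $\psi_0 = \psi_* + \xi/\sqrt{N}$ and $\phi_0 = \phi_* + \delta/\sqrt{N}$. The conventional Taylor
series expansion about $\theta_{00}$ gives

(A.1)  $\dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_*)}{\partial\gamma} = \dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_{00})}{\partial\gamma} + \dfrac{1}{N}\dfrac{\partial^2 L(\theta_{00})}{\partial\gamma\,\partial\phi'}\sqrt N(\phi_* - \phi_0) + \dfrac{1}{N}\dfrac{\partial^2 L(\theta_{00})}{\partial\gamma\,\partial\psi'}\sqrt N(\psi_* - \psi_0) + o_p(1)$
       $= \dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_{00})}{\partial\gamma} + J_{\gamma\phi}\delta + J_{\gamma\psi}\xi + o_p(1)$.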


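Noting $\partial L(\tilde\theta)/\partial\gamma = 0$, a similar expansion about $\tilde\theta$ yields

(A.2)  $\dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_*)}{\partial\gamma} = -\dfrac{1}{N}\dfrac{\partial^2 L(\tilde\theta)}{\partial\gamma\,\partial\gamma'}\sqrt N(\tilde\gamma - \gamma_0) + o_p(1) = J_\gamma\sqrt N(\tilde\gamma - \gamma_0) + o_p(1)$.

From (A.1) and (A.2) we obtain

(A.3)  $\sqrt N(\tilde\gamma - \gamma_0) = J_\gamma^{-1}\dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_{00})}{\partial\gamma} + J_\gamma^{-1}J_{\gamma\phi}\delta + J_\gamma^{-1}J_{\gamma\psi}\xi + o_p(1)$.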


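Another Taylor series expansion shows

(A.4)  $\dfrac{1}{\sqrt N}\dfrac{\partial L(\tilde\theta)}{\partial\psi} = \dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_{00})}{\partial\psi} + \dfrac{1}{N}\dfrac{\partial^2 L(\theta_{00})}{\partial\psi\,\partial\gamma'}\sqrt N(\tilde\gamma - \gamma_0) + \dfrac{1}{N}\dfrac{\partial^2 L(\theta_{00})}{\partial\psi\,\partial\psi'}\sqrt N(\psi_* - \psi_0) + \dfrac{1}{N}\dfrac{\partial^2 L(\theta_{00})}{\partial\psi\,\partial\phi'}\sqrt N(\phi_* - \phi_0) + o_p(1)$.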

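Using (A.3) we can rewrite (A.4) as

(A.5)  $\dfrac{1}{\sqrt N}\dfrac{\partial L(\tilde\theta)}{\partial\psi} = \dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_{00})}{\partial\psi} - J_{\psi\gamma}J_\gamma^{-1}\dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_{00})}{\partial\gamma} + J_{\psi\phi\cdot\gamma}\delta + J_{\psi\cdot\gamma}\xi + o_p(1)$.

Since $N^{-1/2}\,\partial L(\theta_{00})/\partial\theta \xrightarrow{D} N(0, J)$, it is readily seen that

(A.6)  $\dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_{00})}{\partial\psi} - J_{\psi\gamma}J_\gamma^{-1}\dfrac{1}{\sqrt N}\dfrac{\partial L(\theta_{00})}{\partial\gamma} \xrightarrow{D} N(0, J_{\psi\cdot\gamma})$.

(A.5) and (A.6) finally give

(A.7)  $\dfrac{1}{\sqrt N}\,d_\psi(\tilde\theta) \xrightarrow{D} N(J_{\psi\cdot\gamma}\xi + J_{\psi\phi\cdot\gamma}\delta,\; J_{\psi\cdot\gamma})$.

Since $J_{\psi\cdot\gamma}(\tilde\theta) = J_{\psi\cdot\gamma} + o_p(1)$, the result in (3.5) follows.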

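The asymptotic nonnull distribution of $LM^*_\psi$ in (4.11) can be found easily by noting
that when $H_1: \psi = \psi_* + \xi/\sqrt N$ is true, i.e., under $L(\theta_{00})$, an argument similar to (A.1)-(A.7)
gives

(A.8)  $\dfrac{1}{\sqrt N}\,d_\phi(\tilde\theta) \xrightarrow{D} N(J_{\phi\psi\cdot\gamma}\xi + J_{\phi\cdot\gamma}\delta,\; J_{\phi\cdot\gamma})$.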

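Using (A.7) and (A.8), it is straightforward to see that the asymptotic distribution of (4.6)
under $H_1$ is

(A.9)  $\dfrac{1}{\sqrt N}\,d_\psi(\tilde\theta) - J_{\psi\phi\cdot\gamma}J_{\phi\cdot\gamma}^{-1}\dfrac{1}{\sqrt N}\,d_\phi(\tilde\theta) \xrightarrow{D} N\big((J_{\psi\cdot\gamma} - J_{\psi\phi\cdot\gamma}J_{\phi\cdot\gamma}^{-1}J_{\phi\psi\cdot\gamma})\xi,\; J_{\psi\cdot\gamma} - J_{\psi\phi\cdot\gamma}J_{\phi\cdot\gamma}^{-1}J_{\phi\psi\cdot\gamma}\big)$.

Since $J(\tilde\theta) = J + o_p(1)$, we have thus proved the result in (4.11).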